Abstract

Background: The ever-increasing discovery of non-coding RNAs creates an unprecedented demand for accurate modeling of RNA folding, including the prediction of two-dimensional (base pair) and three-dimensional all-atom structures and of folding stabilities. Accurate modeling of RNA structure and stability has a far-reaching impact on our understanding of RNA functions in human health and on our ability to design RNA-based therapeutic strategies.

Results: The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with different intra-loop mismatches, and evaluates the free energies using experimental parameters for the base stacks and, for the loops, the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model). To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from known PDB structures and refines the structure using all-atom energy minimization.

Conclusions: The Vfold-based web server provides a user-friendly tool for the prediction of RNA structure and stability. The web server and the source code are freely accessible for public use at "http://rna.physics.missouri.edu".
Introduction

The increasing discovery of noncoding RNAs demands, more than ever, information about RNA structures [1-5]. However, laborious, time-consuming X-ray crystallographic or NMR spectroscopic measurements alone cannot keep pace with the rapidly increasing number of biologically significant RNAs such as noncoding regulatory RNAs. As a result, RNA structural genomics cannot rely solely on the experimental determination of structures. This underscores the need for accurate computational models of RNA structure prediction.

RNA structures can be described at the two-dimensional (2D) and three-dimensional (3D) levels. A 2D structure is defined by the base pairs contained in the structure, and the helices and loops formed by these base pairs can be depicted diagrammatically in an RNA 2D structure. The 2D structure of an RNA provides the structural constraints for the formation of the 3D structure [6-9], in which helices and loops are assembled in 3D space. The RNA free energy landscape can have multiple free energy minima [10-13]. Therefore, an RNA can often adopt multiple stable and metastable structures.

Computational prediction of RNA 2D structures falls into two general categories [14]: sequence comparison [15-18] and free energy minimization [19-25]. Sequence comparison-based methods rely on base covariation and can usually infer only the canonical base pairs. The inclusion of non-canonical base pairs makes covariation analysis much more convoluted [26]. However, non-canonical base pairs, such as mismatched base pairs in the loop regions, may be crucial for folding stability and 3D structure folding. For example, non-canonical base pairs can influence the loop and junction structures and thus play a critical role in determining helix orientations. The accuracy of computational prediction is usually better for methods that consider "fold recognition" [27]: structure is usually more conserved than sequence, and the functional core regions are usually more conserved at all levels. Therefore, such methods are highly useful and reliable for structures with known homologous folds or with sufficient auxiliary structural data. However, these methods depend on the availability of homologous sequences, which significantly limits their applicability.

Structure prediction algorithms based on free energy minimization search an ensemble of possible structures for the structure (or suboptimal structures) with the lowest free energy. Most of the algorithms employ the same empirical thermodynamic parameters (the Turner parameters [28]) for the different secondary structural elements based on the nearest-neighbor model. However, whereas the entropy (free-energy) parameters for simple loops (hairpin, bulge, and internal loops) have been determined from thermodynamic experiments [28], quantitative understanding of many other interactions remains very limited. Moreover, because of possible conformational coupling between the loops, loop entropies are not additive for tertiary motifs such as loop-loop kissing contacts [29,30]. For such cases, thermodynamic experiments alone cannot directly provide loop entropies and free energies because of the complexity of the problem.

Current RNA folding algorithms for 3D structures are generally limited to simple (short) structures. Further development of the models is hampered by several challenges, including conformational sampling and the evaluation of energies for tertiary contacts. Combined with discrete molecular dynamics (DMD) [31], coarse-grained approaches [31-33] can be used to predict structures as well as folding mechanisms with knowledge-based potentials derived from known structures. Structure assembly approaches [26,34-36], based on the assumption that a 3D fold can be recognized by the alignment of sequences and secondary structure patterns, have shown promising results in RNA 3D structure prediction. However, a common limitation of structure assembly approaches is the limited diversity of the fragment library [37,38].

The recently developed Vfold model is a statistical mechanics- 
based RNA folding model [36,39-42] that can predict RNA 2D 
and 3D structures as well as RNA folding thermodynamic 
stabilities from RNA sequence. In this report, we briefly describe 
the underlying algorithm and the practical usage of a web server 
for the Vfold model (http://rna.physics.missouri.edu). The server 
provides predictions for the structure and melting thermodynamics 
for user-provided RNA sequences. The results from the server, in 
combination with experimental data, may offer useful insights into 
RNA structure and function. 

Methods 

The Vfold model was first reported in 2005 for RNA secondary structure prediction [39]. Since then, the model has been extended to predict the structures and folding thermodynamics of H-type pseudoknots and RNA/RNA complexes [40-42]. Furthermore, Vfold was developed to predict 3D all-atom structures using a physics-based de novo method [36]. Below we describe several unique features of the Vfold model. The detailed underlying algorithms can be found in the published papers [36,39-42] and in the Supporting Information (file Data S1) of this paper.

Features of the Vfold algorithm 

One of the unique features of the Vfold model for 2D structure (base pair) prediction is its ability to compute RNA motif-based loop entropies. Using virtual bonds to represent the backbone conformations, the model samples the fluctuations of loop/junction conformations in 3D space through conformational enumeration [39] (see Figure S1 in Data S1 for details). By calculating the probability of loop formation, the model gives the conformational entropy parameters for the formation of different types of loops, such as pseudoknot loops. The model has the advantage of accounting for chain connectivity, excluded volume and the completeness of the conformational ensemble. Studies by us and other groups show that accurate entropy parameters improve the prediction of RNA secondary structures and thermodynamic stabilities [39-43].
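The idea of obtaining a loop entropy from the fraction of closable conformations can be illustrated with a deliberately simplified sketch. It assumes a chain of unit bonds on a simple cubic lattice (not the Vfold virtual-bond geometry described in [39]), enforces chain connectivity and excluded volume by exhaustive self-avoiding enumeration, and takes the entropy as Delta S = R ln(closed/total). The names `enumerate_loop_closures` and `loop_entropy` are hypothetical.

```python
import math

# Simple cubic lattice steps: an illustrative assumption, NOT Vfold's geometry.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)


def enumerate_loop_closures(n_bonds):
    """Enumerate all self-avoiding n-bond walks starting at the origin.

    Returns (total, closed): the number of excluded-volume conformations and
    how many of them end one lattice bond away from the origin, i.e. can be
    closed into a loop by a single additional bond.
    """
    total = closed = 0
    path = [(0, 0, 0)]
    occupied = {(0, 0, 0)}

    def grow(depth):
        nonlocal total, closed
        x, y, z = path[-1]
        if depth == n_bonds:
            total += 1
            if abs(x) + abs(y) + abs(z) == 1:
                closed += 1
            return
        for dx, dy, dz in STEPS:
            site = (x + dx, y + dy, z + dz)
            if site in occupied:  # excluded volume: no lattice site used twice
                continue
            occupied.add(site)
            path.append(site)
            grow(depth + 1)
            path.pop()
            occupied.remove(site)

    grow(0)
    return total, closed


def loop_entropy(n_bonds):
    """Loop-closure entropy R ln(closed / total) in kcal/(mol K)."""
    total, closed = enumerate_loop_closures(n_bonds)
    if closed == 0:  # on this lattice only odd bond counts can close
        raise ValueError("no closable conformations for this chain length")
    return R_KCAL * math.log(closed / total)
```

For three bonds, 24 of the 150 self-avoiding conformations can close, so the sketch gives a negative entropy of R ln(0.16); the cost grows quickly with chain length, which is why real models use tabulated results rather than on-the-fly enumeration.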

Another notable feature of the Vfold model is its ability to model intra-loop mismatched base pairs in RNA loops (see Figure S2 in Data S1 for details). By enumerating all the possible (sequence-dependent) intra-loop mismatches, the Vfold model can partially account for the sequence dependence of the loop free energy. Therefore, the Vfold-predicted loop free energy depends not only on the loop size but also on the sequence. The model provides a unique tool for predicting important information that cannot be obtained through traditional methods. For example, the model can calculate the dramatic decrease in loop entropy upon the formation of mismatched base pairs in a loop. The model can also predict the population distribution of the different loop conformations that contain different intra-loop mismatches. The predicted mismatched base pairs provide constraints on otherwise flexible loop structures.
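How an ensemble of mismatch states turns into a single sequence-dependent loop free energy and a population distribution can be sketched with a Boltzmann sum. This is a minimal illustration, assuming each loop state (with or without a given intra-loop mismatch) already has a free energy; the state labels and free energy values below are hypothetical, and how Vfold computes each state's free energy is described in [39-42] and Data S1.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)


def loop_ensemble(state_free_energies, temp_c=37.0):
    """Combine alternative loop states into one loop free energy.

    state_free_energies maps a state label to its free energy (kcal/mol).
    Returns (G_loop, populations) with G_loop = -RT ln sum_m exp(-G_m / RT)
    and populations[m] the Boltzmann probability of state m.
    """
    rt = R_KCAL * (temp_c + 273.15)
    g_min = min(state_free_energies.values())  # shift exponents for stability
    weights = {s: math.exp(-(g - g_min) / rt) for s, g in state_free_energies.items()}
    z = sum(weights.values())
    g_loop = g_min - rt * math.log(z)
    populations = {s: w / z for s, w in weights.items()}
    return g_loop, populations


# Hypothetical free energies (kcal/mol) for one loop, for illustration only.
g, pops = loop_ensemble({"open": 5.0, "U-U mismatch": 4.2, "G-A mismatch": 4.6})
```

Because every state contributes a positive Boltzmann weight, the ensemble free energy always lies below that of the most stable single state, and changing the loop sequence (and therefore the available mismatches) changes both the loop free energy and the populations.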

For a given 2D structure, the Vfold-based 3D structure prediction method [36] searches for an appropriate template for each loop/junction in the structure and assembles the 3D template structures into a scaffold for further structure refinement. In comparison with other template-based (structure assembly) methods such as FARNA/FARFAR [34,35] and MC-Sym [26], which sample structures from small fragments of known RNA structures, the Vfold-based method uses motif-based rather than fragment-based templates. The main advantage of the multi-scale approach used in Vfold 3D modeling [36] is that the virtual bond tertiary structure used as the initial state may already lie in the free energy basin, so the structure refinement can avoid large structural rearrangements and efficiently reach the final native structure.
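Motif-based template search first requires splitting a 2D structure into the loop/junction motifs listed in Figure 1. The sketch below is a hypothetical helper, not Vfold's code: it handles only pseudoknot-free structures given as 1-based base pairs, classifies each loop closed by a base pair, and reports single-stranded tails and open junctions in the exterior loop.

```python
def decompose(n, pairs):
    """Classify the loops of a pseudoknot-free 2D structure.

    n is the sequence length and pairs holds 1-based (i, j) base pairs with
    i < j. Returns a list of (motif, detail) tuples.
    """
    partner = [0] * (n + 2)
    for i, j in pairs:
        partner[i], partner[j] = j, i

    def branches(start, end):
        """Helices directly enclosed in start..end and the unpaired count."""
        found, unpaired, k = [], 0, start
        while k <= end:
            if partner[k] > k:
                found.append((k, partner[k]))
                k = partner[k] + 1  # skip over the enclosed helix
            else:
                unpaired += 1
                k += 1
        return found, unpaired

    motifs = []
    for i, j in sorted(pairs):
        inner, unpaired = branches(i + 1, j - 1)
        if not inner:
            motifs.append(("hairpin", j - i - 1))
        elif len(inner) == 1:
            p, q = inner[0]
            left, right = p - i - 1, j - q - 1
            if left == right == 0:
                continue  # two stacked base pairs inside a helix
            kind = "bulge" if 0 in (left, right) else "internal"
            motifs.append((kind, (left, right)))
        else:
            motifs.append((f"{len(inner) + 1}-way junction", unpaired))
    ext, ext_unpaired = branches(1, n)
    if ext:
        tails = (ext[0][0] - 1, n - ext[-1][1])
        if tails != (0, 0):
            motifs.append(("single-stranded tails", tails))
        if len(ext) > 1:
            motifs.append(("open junction", (len(ext), ext_unpaired - sum(tails))))
    return motifs
```

Applied to the two 2D structures of the 32-nt example used later in the paper, the most probable structure decomposes into a 3x7 internal loop and a 6-nt hairpin loop, while the alternative structure yields two hairpins joined by an open junction of 4 nt, the motif type for which Figure 1 lists no templates.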

Energy parameters 

The Vfold model provides pre-tabulated entropy parameters (available on the Vfold web server) for hairpin loops [39], internal/bulge loops [39], H-type pseudoknots with/without an inter-helix junction [40,41] and hairpin-hairpin kissing motifs [42]. For free energy-based RNA structure modeling, the predicted structures and thermodynamic stabilities can be sensitive to the choice of energy parameters. Therefore, the server provides predictions based on two different sets of thermodynamic parameters for base stacks, including mismatched base stacks: (1) the Turner parameters (04 version) [28], and (2) the MFOLD 2.3 parameters [20].
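Switching between the two parameter sets amounts to evaluating the same nearest-neighbor sum over stacked base pairs with a different lookup table. The sketch below assumes tables keyed by the 5'-3' dinucleotide of one strand and the 3'-5' dinucleotide of the other (for example "GC/CG"); the numbers in the toy tables are invented placeholders, NOT the Turner 04 or MFOLD 2.3 values, and `stack_free_energy` is a hypothetical helper.

```python
def stack_free_energy(seq, pairs, table):
    """Sum nearest-neighbor stacking free energies over stacked base pairs.

    seq is the RNA sequence, pairs holds 1-based (i, j) pairs, and table maps
    a stack key "5'XY3'/3'X'Y'5'" (written "XY/X'Y'") to a free energy.
    """
    paired = set(pairs)
    total = 0.0
    for i, j in pairs:
        if (i + 1, j - 1) not in paired:
            continue  # (i, j) is not stacked on an inner pair
        key = f"{seq[i - 1]}{seq[i]}/{seq[j - 1]}{seq[j - 2]}"
        alt = f"{seq[j - 2]}{seq[j - 1]}/{seq[i]}{seq[i - 1]}"  # same stack, other strand
        if key in table:
            total += table[key]
        elif alt in table:
            total += table[alt]
        else:
            raise KeyError(f"no stacking parameter for {key}")
    return total


# Invented placeholder values (kcal/mol), standing in for the two real tables.
toy_a = {"GG/CC": -3.0, "GC/CG": -3.5, "CG/GC": -2.5}
toy_b = {"GG/CC": -3.1, "GC/CG": -3.3, "CG/GC": -2.4}
hairpin = [(1, 12), (2, 11), (3, 10), (4, 9)]
```

With real tables plugged in, the same helix can receive noticeably different stacking free energies, which is why the server reports predictions for both parameter sets.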

3D template library 

To construct the template library, Vfold classifies all the known structures into different motifs, such as helices, hairpin loops, internal/bulge loops, pseudoknots and N-way junctions (N ≥ 3) (see Figure 1). The motif-based template library was built from 2621 PDB structures, including all the PDB entries released before January 2014. It includes RNA-containing complexes but excludes RNA/DNA hybrids. Redundant templates, i.e., templates of the same motif type, the same size and an identical sequence whose root mean square deviation (RMSD) is < 1.5 Å, are removed. The complete list of non-redundant 3D templates can be found on the Vfold web server.
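The redundancy criterion can be sketched as a greedy filter that compares templates sharing the same motif type, size and sequence by their RMSD after optimal superposition (the Kabsch algorithm). The greedy order, the use of plain coordinate arrays and the function names are assumptions for illustration; the paper does not describe which atoms Vfold superposes or in what order templates are compared.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (n, 3) coordinate arrays after optimal superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # If the best orthogonal map is a reflection, flip the smallest singular value
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    sq = (a ** 2).sum() + (b ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(sq, 0.0) / len(a)))


def remove_redundant(templates, cutoff=1.5):
    """Greedy redundancy filter over (key, coords) entries.

    key identifies motif type, size and sequence; a template is dropped if an
    already kept template with the same key lies within `cutoff` RMSD.
    """
    kept = []
    for key, coords in templates:
        if any(k == key and kabsch_rmsd(coords, c) < cutoff for k, c in kept):
            continue
        kept.append((key, coords))
    return kept
```

A rigidly rotated and translated copy of a template has an RMSD of essentially zero and is removed, while a template with the same coordinates but a different sequence key is kept because it belongs to a different group.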

Results 

The Vfold server contains three parts: (a) Vfold2D predicts the RNA 2D structure (pseudoknotted or non-pseudoknotted) from the sequence, (b) VfoldThermal predicts the melting curve (folding thermodynamics) from the sequence, and (c) Vfold3D predicts the RNA 3D structure for a given 2D structure and sequence. For Vfold2D and VfoldThermal, the computational time scales with the chain length N as O(N^6) and the memory as O(N^2). To avoid long computational times, the current version of the Vfold server restricts the RNA sequence length to at most 140 nt.

[Figure 1: motif diagrams and template counts of the 3D template library]
Helices: built with A-form helices
Hairpin loops: 2366 templates
Internal/bulge loops: 3260 templates
3-way junctions: 820 templates
n-way junctions: 506, 222, 49 and 61 templates for n = 4, 5, 6 and 7, respectively
H-type pseudoknots: 56 templates
Open junction (two helices): built from coaxially stacked helices for L = 0; no templates for L > 0
Open junction (multiple helices): no templates
Single-stranded tail: no templates

Figure 1. RNA 3D template structure database.
doi:10.1371/journal.pone.0107504.g001
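The stated scaling laws make it easy to estimate how much longer a job will take relative to a reference length. The sketch below combines that estimate with a check against the 140-nt limit; restricting the alphabet to A, C, G and U is an assumption, since the paper does not describe the server's input validation, and both function names are hypothetical.

```python
MAX_LENGTH = 140  # Vfold2D/VfoldThermal length limit stated in the text


def check_sequence(seq):
    """Normalize an RNA sequence and check it against the 140-nt limit."""
    seq = seq.strip().upper()
    if set(seq) - set("ACGU"):  # assumed alphabet, not a documented server rule
        raise ValueError("sequence must contain only A, C, G and U")
    if len(seq) > MAX_LENGTH:
        raise ValueError(f"{len(seq)} nt exceeds the {MAX_LENGTH}-nt limit")
    return seq


def relative_cost(n, n_ref):
    """(time ratio, memory ratio) implied by O(N^6) time and O(N^2) memory."""
    ratio = n / n_ref
    return ratio ** 6, ratio ** 2
```

Doubling the chain length multiplies the run time by 2^6 = 64 but the memory only by 4; going from the 32-nt example to the 140-nt limit, (140/32)^6 is roughly 7,000, which explains why long sequences can take hours.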



[Figure 2: snapshot of the Vfold2D web server (inputs: temperature, 25°C; energy parameters for base stacks, from MFOLD 2.3; sequence GGCGUUUUCGCCUUCGGGCGAUUUUUAUCGCU; conformational space, excluding pseudoknots; job name; optional e-mail address) and its outputs: (1) list of base pair probabilities P_ij; (2) most probable 2D structure (78%); (3) helix probabilities, H1 (1-32 to 4-29): 0.77, H2 (8-21 to 12-17): 0.78, H3 (1-12 to 4-9): 0.22, H4 (17-32 to 22-27): 0.22; (4) alternative 2D structure (22%)]

Figure 2. An example of Vfold2D prediction: the inputs highlighted in the snapshot of the Vfold2D web server are the sequence (32 nts in this example), the temperature (25°C), the energy parameters used for base stacks (from MFOLD in this example) and the structure type (non-pseudoknotted in this example). (1) Vfold2D gives a list of base pair probabilities P_ij (in txt format) between nucleotides i and j. For example, the probability of forming the G1-C12 base pair is 0.22. (2) The most probable 2D structure is derived from the base pairs with P_ij > 0.5. In this example, the predicted most probable 2D structure (plotted by VARNA in the figure) has a probability of 0.78. (3) Vfold2D also predicts all the possible helices from the predicted base pair probabilities. (4) Possible alternative structures can be found from the helix and base pair probabilities (~0.22 in this example).
doi:10.1371/journal.pone.0107504.g002
[Figure 3: snapshot of the VfoldThermal web server (inputs: temperature range, 0°C to 100°C; energy parameters for base stacks, from MFOLD 2.3; the same 32-nt sequence; conformational space, excluding pseudoknots; job name; optional e-mail address) and its outputs: a text list of temperature T versus heat capacity C(T), and the melting curve C(T) versus temperature plotted by Gnuplot]

Figure 3. An example of the VfoldThermal prediction: the inputs highlighted in the snapshot of the VfoldThermal web server are the sequence (32 nts in this example) with the temperature range of 0°C-100°C, the energy parameters used for base stacks (from MFOLD in this example) and the structure type (non-pseudoknotted in this example). From the temperature dependence of the partition function Q(T), VfoldThermal gives a list of the temperature-dependent heat capacity C(T), with a temperature interval of 0.5°C. The melting curve in eps format is generated by Gnuplot.
doi:10.1371/journal.pone.0107504.g003



Vfold2D: Predicting RNA 2D structures from the sequence

The input of Vfold2D is the sequence in plain text form (see the snapshot of the Vfold2D web server in Fig. 2). The default temperature for Vfold2D is 37°C, and users can change it to other values. Users can also choose the base stacking energy parameters, either from the Turner parameters or from MFOLD, and the type of structures to be considered:



1. Excluding pseudoknots: only non-pseudoknotted secondary structures are included in the structure prediction.

2. Including pseudoknots with inter-helix junction length ≤ 1 nt: all the possible non-pseudoknotted secondary structures and H-type pseudoknots with an inter-helix junction of length ≤ 1 nt are considered in the calculation. This may take a much longer computational time than the pseudoknot-free calculation.

3. Including pseudoknots with longer inter-helix junctions: all the possible non-pseudoknotted secondary structures and H-type pseudoknots with inter-helix junctions of any length are considered in the calculation. The computation may take much longer than the calculation with pseudoknots of inter-helix junction length ≤ 1 nt.

[Figure 4: snapshots of the Vfold3D web server (sequence length restricted to ≤ 150 nt) for the 32-nt sequence GGCGUUUUCGCCUUCGGGCGAUUUUUAUCGCU. (a) Visible input base pairs 1-32, 2-31, 3-30, 4-29, 8-21, 9-20, 10-19 and 11-18: a predicted 3D structure is returned. (b) Input base pairs 1-12, 2-11, 3-10, 4-9, 17-32, 18-31, 19-30, 20-29, 21-28 and 22-27: no result ("No template available")]

Figure 4. An example of the Vfold3D prediction: the snapshot of the Vfold3D web server highlights the input sequence (32 nts in this example) and the 2D structures as defined by the base pairs. (a) For the most probable 2D structure shown in Fig. 2, Vfold3D predicts the 3D structure based on templates from the known structures. (b) For the predicted alternative structure shown in Fig. 2, Vfold3D cannot predict the 3D structure because no template is available for the single-stranded chain between the helices.
doi:10.1371/journal.pone.0107504.g004

The Vfold2D server generates three files:

1. Base pair probabilities (in txt format).

2. Probabilities for the formation of the possible helices, including the native and alternative helices (in txt format).

3. Predicted 2D structures (in eps format) plotted by VARNA [44].

We recommend that users consider possible alternative structures based on the base pair probabilities and helix probabilities (the first two output files above).
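The base pair probability file can be post-processed in a few lines. This sketch assumes the whitespace-separated layout visible in the Figure 2 screenshot (i, pair type, j, P_ij per line, e.g. "1 G-C 12 0.22"); the actual file layout may differ, and the function names are hypothetical. It applies the P_ij > 0.5 rule from the Figure 2 caption and writes the result in dot-bracket notation. Because the pairing probabilities of one nucleotide sum to at most 1, no nucleotide can appear in two pairs above 0.5.

```python
def parse_bp_probabilities(text):
    """Parse lines like '1 G-C 12 0.22' into {(i, j): P_ij}; skip other lines."""
    probs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        try:
            i, j, p = int(fields[0]), int(fields[2]), float(fields[3])
        except ValueError:
            continue  # header or malformed line
        probs[(i, j)] = p
    return probs


def most_probable_pairs(probs, threshold=0.5):
    """Base pairs with probability above the threshold, sorted by position."""
    return sorted(pair for pair, p in probs.items() if p > threshold)


def dot_bracket(n, pairs):
    """Dot-bracket string for nested (pseudoknot-free) 1-based pairs."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i - 1], chars[j - 1] = "(", ")"
    return "".join(chars)
```

The dot-bracket string is convenient for comparing alternative structures by eye or for passing them to other tools; it cannot represent pseudoknotted pairs, which would need a second bracket type.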

Fig. 2 shows an example of Vfold2D prediction for a 32-nt sequence [45]. With conformational sampling of the non-pseudoknotted structures, Vfold2D identifies the possible helices (including the alternative ones) from the base pair probabilities, based on the premise that base pairs (helices) in the same structure have similar probabilities of formation. The dominant 2D structure is identified from the base pairs with the largest probabilities. The RNA in Fig. 2 has two sets of helices. One set, shown in magenta, has a probability of 0.78; this is the most probable structure. The other set, shown in cyan, has a probability of 0.22 and gives an alternative structure. The predicted bistable structures agree with the NMR results [45].
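The premise that base pairs of the same structure share a similar probability suggests a simple way to group stacked pairs into helices. In this sketch, consecutive pairs (i, j), (i+1, j-1) join the same helix only if their probabilities differ by at most a tolerance; the tolerance of 0.05, the minimum helix length of 2 and the function name are assumptions, not Vfold's documented procedure.

```python
def group_helices(probs, tol=0.05, min_length=2):
    """Group stacked base pairs with similar probabilities into helices.

    probs maps 1-based (i, j) pairs to probabilities. Returns a list of
    (pairs, mean probability) tuples, one per helix of at least min_length.
    """

    def same_level(p, q):
        return abs(probs[p] - probs[q]) <= tol

    helices = []
    for i, j in sorted(probs):
        outer = (i - 1, j + 1)
        if outer in probs and same_level(outer, (i, j)):
            continue  # (i, j) continues a helix that starts further out
        stack = [(i, j)]
        while True:
            a, b = stack[-1]
            inner = (a + 1, b - 1)
            if inner not in probs or not same_level(inner, stack[-1]):
                break
            stack.append(inner)
        if len(stack) >= min_length:
            mean_p = sum(probs[p] for p in stack) / len(stack)
            helices.append((stack, mean_p))
    return helices
```

Run on the base pair probabilities shown in the Figure 2 output, this grouping recovers four helices with mean probabilities of about 0.77 (1-32 to 4-29), 0.78 (8-21 to 12-17), 0.22 (1-12 to 4-9) and 0.22 (17-32 to 22-27), plus a negligible low-probability fragment that a probability cutoff removes.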

VfoldThermal: predicting RNA melting curves 

VfoldThermal predicts the heat capacity melting curve C(T) from the temperature dependence of the partition function Q(T) of the conformational ensemble chosen by the user. The server provides the results in text format as well as in eps format plotted by Gnuplot. The inputs of VfoldThermal are the same as those of Vfold2D, except that VfoldThermal takes a temperature range (see the snapshot of the VfoldThermal web server in Fig. 3).

For the example shown in Fig. 3, with the same input as for Vfold2D in Fig. 2, VfoldThermal calculates the partition function Q(T) of all the non-pseudoknotted structures over the temperature range 0°C-100°C with a temperature step of 0.5°C. The predicted heat capacity (melting curve) shows two peaks, around 60°C and 90°C. The peaks correspond to the melting of the two helices in the predicted structures in Fig. 2.
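The route from Q(T) to C(T) follows standard statistical mechanics: the mean enthalpy is <H> = R T^2 d ln Q/dT and the heat capacity is C(T) = d<H>/dT. The sketch below evaluates both by finite differences on a 0.5°C grid. The ensemble is a toy model of two independent two-state helices with invented enthalpies and melting temperatures, an assumption for illustration only; Vfold sums over the full structural ensemble with its own free energies.

```python
import numpy as np

R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)


def log_partition_function(states, temps_k):
    """ln Q(T) for states given as (dH, dS) relative to the unfolded chain."""
    dh = np.array([s[0] for s in states], dtype=float)[:, None]
    ds = np.array([s[1] for s in states], dtype=float)[:, None]
    exponents = -(dh - temps_k * ds) / (R_KCAL * temps_k)  # -G_s / RT
    top = exponents.max(axis=0)  # log-sum-exp keeps the sum finite
    return top + np.log(np.exp(exponents - top).sum(axis=0))


def heat_capacity(states, temps_c):
    """C(T) = d<H>/dT with <H> = R T^2 d ln Q / dT, by finite differences."""
    temps_k = np.asarray(temps_c, dtype=float) + 273.15
    ln_q = log_partition_function(states, temps_k)
    mean_h = R_KCAL * temps_k ** 2 * np.gradient(ln_q, temps_k)
    return np.gradient(mean_h, temps_k)


def two_state_helix(dh, tm_c):
    """Toy helix (dH, dS) whose free energy vanishes at tm_c degrees C."""
    return dh, dh / (tm_c + 273.15)


h1 = two_state_helix(-50.0, 60.0)  # invented values for illustration
h2 = two_state_helix(-60.0, 90.0)
# Unfolded, either helix alone, or both helices (independent-helix toy model)
states = [(0.0, 0.0), h1, h2, (h1[0] + h2[0], h1[1] + h2[1])]
temps_c = np.arange(0.0, 100.25, 0.5)  # 0-100 degrees C in 0.5 degree steps
c = heat_capacity(states, temps_c)
```

Each helix produces its own peak in C(T) close to its melting temperature, which mirrors how the two peaks in Fig. 3 are attributed to the melting of two helices.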

Vfold3D: Predicting RNA 3D structure 

The input data of Vfold3D are the RNA sequence and the 2D structure (base pairs) (see the snapshot of the Vfold3D web server in Fig. 4). The output of Vfold3D is a PDB file of the predicted all-atom 3D structure(s). Because the current version of Vfold3D is template-based, no 3D structure is predicted if a proper template cannot be found.
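Before submitting a 2D structure, it can help to check the base pair list for obvious errors. The sketch assumes one "i j" pair per line, as in the Fig. 4 snapshot, and checks that indices are in range, that no nucleotide is paired twice, and whether any pairs cross (a pseudoknot). The helper names are hypothetical and do not reflect the server's own input checks.

```python
def parse_pairs(seq, text):
    """Parse 'i j' lines into validated 1-based base pairs with i < j."""
    pairs, used = [], set()
    for line in text.splitlines():
        if not line.strip():
            continue
        i, j = sorted(int(x) for x in line.split())
        if not 1 <= i < j <= len(seq):
            raise ValueError(f"pair {i}-{j} is invalid for a {len(seq)}-nt sequence")
        if i in used or j in used:
            raise ValueError(f"a nucleotide in pair {i}-{j} is already paired")
        used.update((i, j))
        pairs.append((i, j))
    return pairs


def has_pseudoknot(pairs):
    """True if any two base pairs cross, i.e. i < k < j < l."""
    return any(i < k < j < l for i, j in pairs for k, l in pairs)
```

The pairwise crossing test is quadratic in the number of pairs, which is negligible for sequences of the lengths the server accepts.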

Owing to the limited structural template database, the current version of Vfold3D can only predict 3D structures with hairpin loops, internal/bulge loops, N-way (2<N<8) junctions and pseudoknots. For example, as listed in Figure 1, there are no templates available for open motifs (single-strand tails and tandem helices other than coaxially stacked helices). Therefore, we recommend removing single-strand tails before submitting jobs to Vfold3D. As the number of known RNA structures grows, larger and more diverse pools of known loop/junction structures of different types and sizes should lead to better predictions from Vfold3D.
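Removing single-strand tails means cutting the sequence to the span between the first and last paired nucleotides and renumbering the base pairs accordingly. A minimal sketch, assuming 1-based (i, j) pairs with i < j; `trim_tails` is a hypothetical helper.

```python
def trim_tails(seq, pairs):
    """Drop unpaired 5' and 3' tails and renumber 1-based base pairs."""
    if not pairs:
        raise ValueError("no base pairs: nothing to model")
    first = min(i for i, _ in pairs)  # first paired nucleotide
    last = max(j for _, j in pairs)   # last paired nucleotide
    offset = first - 1
    trimmed = seq[first - 1:last]
    return trimmed, [(i - offset, j - offset) for i, j in pairs]


# Example: a hairpin with a two-nucleotide tail on each end
seq, pairs = trim_tails("AAGGGGAAACCCCUU", [(3, 13), (4, 12), (5, 11), (6, 10)])
```

Here the trimmed sequence is GGGGAAACCCC and the pairs become (1, 11) to (4, 8); the removed tails can be rebuilt afterwards with other modeling tools if they are needed.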
For the RNA in Fig. 2, Vfold2D predicts two alternative 2D structures. As shown in Fig. 4, Vfold3D predicts one 3D structure for the most probable 2D structure. For the alternative 2D structure, which consists of two hairpins connected by a single-stranded segment, Vfold3D yields no 3D structure because there are no templates for the UUCG single-stranded open junction between the two hairpins.

Vfold output 

Once a calculation is submitted, a notification page showing the job information (job name, optional e-mail address and job status) is displayed. When the calculation is completed, the Vfold web server sends an e-mail notification with the predicted results attached, if an e-mail address was provided. Because Vfold2D and VfoldThermal may take a long computational time (hours or even longer) depending on the sequence length, we recommend bookmarking the job-specific notification page to check the job status later and to download the Vfold results. An online README file explaining how to interpret the Vfold predictions is available on the Vfold web server.

Conclusion 

The Vfold package was developed to predict RNA structures and folding thermodynamics. The web server will be updated continuously as new Vfold-based algorithms for RNA folding are developed. In future development, we plan to add structure predictions for RNA-RNA complexes. We will also add to the melting curve calculations and structure predictions the ion-dependent electrostatic free energies and the heat capacity effect, which can make the enthalpy and entropy parameters for loop and base stack formation temperature-dependent.